feat(traces): add reranking span kind for document reranking in llama index #1588
Conversation
Python code LGTM. Any reason to keep the test you added as an integration test, or can it be made into a unit test?
Let's get a review from @mikeldking for the frontend.
Looks awesome @RogerHYang !
we need a followup for langchain too
LGTM! I filed some follow-up tickets. Might do a quick audit on the UI post any changes but looks good!
app/src/pages/trace/TracePage.tsx (outdated)
<CodeBlock value={query} mimeType="text" />
</Card>
<Card
  title={`Input Documents (${numInputDocuments})`}
There is a `titleExtra` prop on Card where you can place a `Counter` component: https://5f9739a76e154c00220dd4b9-zeknbennzf.chromatic.com/?path=/story/counter--gallery
app/src/pages/trace/TracePage.tsx (outdated)
title={`Re-ranked Documents (${numOutputDocuments})`}
{...defaultCardProps}
Same as above - rely on `titleExtra` and `Counter`.
app/src/pages/trace/TracePage.tsx (outdated)
{input_documents.map((document, idx) => {
  return (
    <li key={idx}>
      <DocumentItem document={document} />
I might re-color these just so that there's a visual hierarchy of color (e.g. so that re-ranked documents take on a different tint); this way, as you are clicking around, you can clearly see the difference.
Re-using the `DocumentItem` component is good, but I think just showing the new `score` label might be a tad confusing? Or maybe using `score` as an abstraction is intended here.
In one case it's often spatial distance, whereas when running through a reranker it is a relevance rank. Just thinking that from a user's perspective, by displaying `score: XX` alongside both, we lose a bit of an opportunity to explain what `score` means in this context, `score` being pretty generic.
I think even though `score` is generic, it is still accurate. On the input side of the reranker, `score` may or may not exist, and even if it does exist, it's not considered by the reranker. But if the "input" `score` does exist, it was generated by a preprocessor for a separate purpose. The general mental picture here is that there could be millions of documents in a corpus, and only a relatively small set are chosen to be reranked, and that selection process can have a `score` of its own based on the query in question. Even though that `score` is not meaningful to the reranker, it is still an informative attribute of the input document, because it relays how the document became a candidate in the first place (especially when the preprocessor is missing from the trace). On the other hand, we can't really get more specific than the `score` verbiage because we don't have more information. On balance, although it may seem confusing at first, a user should have enough context to reason their way through it.
I think I wasn't disputing the way we capture the score; I was just thinking of ways to reduce the mental "reason their way through it" a bit. But I don't have an immediately good prefix for the reranker score, so let's keep it for now.
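To make the distinction in this thread concrete, here is a small hypothetical sketch (plain Python, not the actual llama_index or Phoenix data model; the document shapes and the toy relevance function are illustrative assumptions): input documents may carry a retriever-assigned `score` that the reranker ignores, while output documents get a fresh relevance `score` from the reranker, under the same generic key name.

```python
# Input documents: `score` (if present) came from an upstream retriever,
# e.g. a vector-distance-based preprocessor. The reranker ignores it.
input_documents = [
    {"content": "doc about rerankers", "score": 0.12},
    {"content": "doc about retrieval", "score": 0.34},
    {"content": "doc about llamas"},  # an input `score` may be absent entirely
]

def toy_relevance(query: str, content: str) -> float:
    # Stand-in for a real reranking model: fraction of query words present.
    words = query.lower().split()
    return sum(w in content.lower() for w in words) / len(words)

query = "retrieval"
# Output documents: `score` now means reranker relevance, a different
# quantity from the input score even though the key name is the same.
output_documents = sorted(
    (
        {"content": d["content"], "score": toy_relevance(query, d["content"])}
        for d in input_documents
    ),
    key=lambda d: d["score"],
    reverse=True,
)
```

The key point the sketch captures is that the two `score` values are produced by different components for different purposes, even though they share one label.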
RERANKING_INPUT_DOCUMENTS,
RERANKING_MODEL_NAME,
RERANKING_OUTPUT_DOCUMENTS,
RERANKING_TOP_K,
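A hedged sketch of how the four attributes named in the diff might be populated for one reranking span. Only the constant names come from the diff; the dict shape, key spellings, and all values below are illustrative assumptions, not the PR's actual semantic conventions.

```python
# Hypothetical reranking span payload. The relationships it illustrates:
# input documents are the candidates fed in, output documents are the
# (at most) top_k survivors, each carrying a reranker-assigned score.
reranking_span_attributes = {
    "RERANKING_MODEL_NAME": "my-reranker",  # hypothetical model name
    "RERANKING_TOP_K": 3,                   # max docs kept in the output
    "RERANKING_INPUT_DOCUMENTS": [          # candidates fed to the reranker
        {"content": "doc A"},
        {"content": "doc B"},
        {"content": "doc C"},
        {"content": "doc D"},
    ],
    "RERANKING_OUTPUT_DOCUMENTS": [         # top_k docs, freshly rescored
        {"content": "doc B", "score": 0.9},
        {"content": "doc D", "score": 0.7},
        {"content": "doc A", "score": 0.4},
    ],
}
```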
Quick question on this parameter: https://docs.cohere.com/docs/reranking
I'm guessing this is the same as TOP_N? If you feed, say, 5 documents but pass a top_k of 3, does it only rank 3? Just trying to understand why this is a parameter to the rerank model.
Yes, this is the same as TOP_N. (The letter is `K` in the literature because `N` is usually the total number of docs.) The caller of the reranker usually just wants a relatively small number of docs out of an initial set of tens or hundreds. It's certainly optional, because the reranker can simply rank every document, but in general a reduction in number is expected at each stage of the retrieval process.
Still confused though: if I pass, say, 5 documents with a `top_k` of 3, does it rank 5 and trim the last two?
Yes, it retains up to `K` documents in the output, so in the case of top 3, the two docs with the lowest scores have to be dropped. Top `K` is applied after ranking all 5 docs.
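The top-k semantics described above can be sketched in a few lines of plain Python (a toy stand-in, not any real reranker's implementation; `overlap_score` is an invented relevance function for illustration): every candidate is scored, results are sorted, and only then is the list truncated to `K`.

```python
def rerank(query: str, documents: list[str], top_k: int) -> list[tuple[int, str]]:
    """Rank ALL documents by score, then keep at most `top_k` of them."""
    scored = sorted(
        ((overlap_score(query, doc), doc) for doc in documents),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return scored[:top_k]  # truncation happens only after full ranking

def overlap_score(query: str, doc: str) -> int:
    # Toy relevance: number of query words that also appear in the document.
    return len(set(query.split()) & set(doc.split()))

docs = ["alpha beta gamma", "alpha beta", "alpha", "delta", "epsilon"]
# All 5 docs get scored; the two lowest-scoring ones are dropped.
top3 = rerank("alpha beta gamma", docs, top_k=3)
```

This matches the exchange above: with 5 inputs and `top_k` of 3, all 5 are ranked and the last two are trimmed.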
@@ -76,11 +75,7 @@
"if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
"    openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
"openai.api_key = openai_api_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
"\n",
"if not (cohere_api_key := os.getenv(\"COHERE_API_KEY\")):\n",
This is the internal one - it seemed nice to have a notebook that executes a reranker?
Oh, my bad. I had mistaken this one for the public notebook llama_index_tracing_tutorial.ipynb, which has a very similar name.
reverted
resolves #1153
Screen.Recording.2023-10-06.at.5.00.57.PM.mov